
Oh hello, I didn’t expect to see you here. This article is currently work in progress, I’ll remove this callout when it’s done.
I’ve been a Spotify user since 2017, and during that time I’ve created a myriad of playlists, most of which I’m willing to admit I haven’t touched in a long time. Generally speaking, my playlists fall into one of four categories.
First, and most predominant, are time-based playlists. I create bimonthly playlists to which I add songs I’ve enjoyed over those two months, and at the end of the year I compile all six of those into a yearly round-up playlist. Second are genre-based playlists: I have a playlist for each genre I listen to (of course there’s bound to be a bit of overlap). Third are language-based playlists, since I listen to a mixture of English, Japanese, Korean, and Cantonese songs. Fourth, and finally, are playlists created from song radios or other Spotify-generated means.
One thing that you might not know about Spotify is that it has an API that developers can use to build Spotify apps. It’s called the Spotify Web API and it allows you to control audio playback, manage your Spotify library, get metadata on tracks, artists, and albums, and much more. I’m using the Spotipy Python library to access it. For my purposes, I’ll be using it to fetch my playlists’ metadata and the audio features of their tracks.
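I won’t be walking through the collection code in full, but a rough sketch of how the Spotipy calls fit together looks like the following. Note this is a sketch under assumptions: `fetch_playlist_rows` and `track_rows` are hypothetical helpers of my own naming, and Spotipy’s auth helpers read the `SPOTIPY_CLIENT_ID`, `SPOTIPY_CLIENT_SECRET`, and `SPOTIPY_REDIRECT_URI` environment variables.

```python
def track_rows(items, features):
    """Join playlist track metadata with audio features into flat rows.

    `items` come from `sp.playlist_items(...)`, `features` from
    `sp.audio_features(...)`; the two are matched on track id.
    """
    feats_by_id = {f['id']: f for f in features if f}
    rows = []
    for item in items:
        track = item['track']
        feat = feats_by_id.get(track['id'], {})
        rows.append({
            'name': track['name'],
            'artist': track['artists'][0]['name'],
            'danceability': feat.get('danceability'),
            'energy': feat.get('energy'),
        })
    return rows


def fetch_playlist_rows(playlist_id):
    """Fetch one playlist's tracks plus audio features via Spotipy."""
    import spotipy  # imported lazily so track_rows stays usable without it
    from spotipy.oauth2 import SpotifyOAuth

    sp = spotipy.Spotify(auth_manager=SpotifyOAuth(scope='playlist-read-private'))
    items = sp.playlist_items(playlist_id)['items']
    track_ids = [item['track']['id'] for item in items]
    features = sp.audio_features(track_ids)
    return track_rows(items, features)
```

The real collection loops this over every playlist and handles pagination; the sketch keeps just the shape of the calls.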
In this blog we’ll be going on a journey that explores my playlists in a data-driven way, eventually producing a machine learning algorithm that, given a track, returns the most similar tracks in my playlists. All plots, where possible, are made using Plotly (which is natively interactive), so hover over them, click on them, and drag around on them - see what happens!
Data Overview
I’ve already done the hard work of collecting the data for each track within my playlists as well as sourcing external data used to determine the genre(s) of tracks. In total, there are 7,842 rows and 32 columns. I’ve shown below a small extract of the data, with the most important columns, to get an idea of what we’re working with here:
| | name | artist | popularity | playlist_name | playlist_date_added | danceability | energy | loudness | speechiness | acousticness | instrumentalness | liveness | valence | tempo | lang_jap | lang_kor | lang_can | lang_eng | pop | rock | hip_hop | indie | rap | alternative |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 4249 | Little Too Late | Camino 84 | 11 | January & February 2020 | 2020-01-11 | 0.746 | 0.520 | -7.513 | 0.035 | 0.091 | 0.000 | 0.143 | 0.656 | 119.961 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3912 | Kiss Me More (feat. SZA) | Doja Cat | 83 | August & September 2021 | 2021-08-05 | 0.518 | 0.590 | -7.024 | 0.027 | 0.044 | 0.000 | 0.435 | 0.413 | 107.870 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 |
| 7257 | etoile et toi | 物語シリーズ | 42 | July & August 2019 | 2019-07-12 | 0.596 | 0.444 | -11.000 | 0.027 | 0.694 | 0.621 | 0.107 | 0.189 | 100.006 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 5917 | Never Wanna Fall in Love With U | nelward | 54 | September & October 2020 | 2020-09-29 | 0.924 | 0.592 | -6.898 | 0.045 | 0.028 | 0.000 | 0.138 | 0.968 | 93.000 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1078 | Dream Eater | Kill Bill: The Rapper | 46 | Rap/HipHop/Trap | 2019-04-07 | 0.562 | 0.714 | -7.936 | 0.180 | 0.539 | 0.000 | 0.703 | 0.509 | 151.974 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 |
Data Dictionary
You might be looking at some of those column names with no idea what they mean or represent; luckily, Spotify provides explanations for its audio features. And even luckier for you, I’ve created a data dictionary that describes what each column represents. Note that the language columns were created using my language playlists and the genre columns were created using data from Every Noise at Once.
| Column | Category | Description |
|---|---|---|
| name | Track Property | Name of the track. |
| artist | Track Property | Name of the artist. |
| popularity | Artist Property | The popularity of the artist. The value will be between 0 and 100, with 100 being the most popular. The artist’s popularity is calculated from the popularity of all the artist’s tracks. |
| playlist_name | Track Property | Name of playlist in which this track resides. |
| playlist_date_added | Track Property | Date and time when the track was added to playlist. |
| danceability | Mood | Danceability describes how suitable a track is for dancing based on a combination of musical elements including tempo, rhythm stability, beat strength, and overall regularity. A value of 0.0 is least danceable and 1.0 is most danceable. |
| energy | Mood | Energy is a measure from 0.0 to 1.0 and represents a perceptual measure of intensity and activity. Typically, energetic tracks feel fast, loud, and noisy. For example, death metal has high energy, while a Bach prelude scores low on the scale. Perceptual features contributing to this attribute include dynamic range, perceived loudness, timbre, onset rate, and general entropy. |
| loudness | Track Property | The overall loudness of a track in decibels (dB). Loudness values are averaged across the entire track and are useful for comparing relative loudness of tracks. Loudness is the quality of a sound that is the primary psychological correlate of physical strength (amplitude). Values typically range between -60 and 0 dB. |
| speechiness | Track Property | Speechiness detects the presence of spoken words in a track. The more exclusively speech-like the recording (e.g. talk show, audio book, poetry), the closer to 1.0 the attribute value. Values above 0.66 describe tracks that are probably made entirely of spoken words. |
| acousticness | Context | A confidence measure from 0.0 to 1.0 of whether the track is acoustic. 1.0 represents high confidence the track is acoustic. |
| instrumentalness | Track Property | Predicts whether a track contains no vocals. “Ooh” and “aah” sounds are treated as instrumental in this context. Rap or spoken word tracks are clearly “vocal”. The closer the instrumentalness value is to 1.0, the greater likelihood the track contains no vocal content. |
| liveness | Context | Detects the presence of an audience in the recording. Higher liveness values represent an increased probability that the track was performed live. A value above 0.8 provides strong likelihood that the track is live. |
| valence | Mood | A measure from 0.0 to 1.0 describing the musical positiveness conveyed by a track. Tracks with high valence sound more positive (e.g. happy, cheerful, euphoric), while tracks with low valence sound more negative (e.g. sad, depressed, angry). |
| tempo | Mood | The overall estimated tempo of a track in beats per minute (BPM). In musical terminology, tempo is the speed or pace of a given piece and derives directly from the average beat duration. |
| lang_jap | Language | Whether the track features Japanese language, binary value of 0 or 1. |
| lang_kor | Language | Whether the track features Korean language, binary value of 0 or 1. |
| lang_can | Language | Whether the track features Cantonese language, binary value of 0 or 1. |
| lang_eng | Language | If a track doesn’t feature Japanese, Korean, or Cantonese it’s assumed to be English. Binary value of 0 or 1. |
| pop | Genre | Binary value that describes whether a track’s artist is labelled as pop. |
| rock | Genre | Binary value that describes whether a track’s artist is labelled as rock. |
| hip_hop | Genre | Binary value that describes whether a track’s artist is labelled as hip-hop. |
| indie | Genre | Binary value that describes whether a track’s artist is labelled as indie. |
| rap | Genre | Binary value that describes whether a track’s artist is labelled as rap. |
| alternative | Genre | Binary value that describes whether a track’s artist is labelled as alternative. |
Exploratory Data Analysis
In this section we’ll begin to explore the data and extract some insights that will help us reach our eventual goal of identifying similar songs across playlists.
Starting Simple with Univariate Analysis
It’s always good to start with simple descriptive analysis to get a feel for the data. Let’s first start by looking at how many tracks are in each playlist.
Clearly 2019 was a good year for music, with 973 tracks being added to the ‘2019 Complete Round Up’ throughout the year. That’s 2.7 tracks per day on average! I’m quite picky with the tracks I’ll add to my bimonthly playlists; I’d estimate 1 track added for every 15 tracks listened to - going by that, I was listening to about 41 new tracks every day. For some context, 2019 was when I was finishing my master’s degree in Data Science, during which I was listening to music while studying.
Let’s take a look at the distribution of the features in the dataset.
Danceability describes how suitable a track is for dancing based on a combination of musical elements including tempo, rhythm stability, beat strength, and overall regularity. A value of 0.0 is least danceable and 1.0 is most danceable.
The median value (value in the middle when sorted in ascending order) is 0.636 and the middle half of the data (quartile 1 to 3) is between 0.515 and 0.726. Going by the above definition of danceability given by Spotify, I’m inclined to say that 75% (quartile 2, 3, and 4) of my tracks are pretty danceable! The distribution also looks to have a bit of a left skew.
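The medians and quartiles quoted throughout this section come straight from pandas. A minimal sketch on a toy stand-in column (the real call is the same `quantile` on the full `df['danceability']`):

```python
import pandas as pd

# Toy stand-in for one audio-feature column
danceability = pd.Series([0.21, 0.48, 0.52, 0.61, 0.64, 0.70, 0.73, 0.88])

# Quartile 1, median, and quartile 3 in one call
q1, median, q3 = danceability.quantile([0.25, 0.5, 0.75])
print(f'Q1={q1:.4f}  median={median:.4f}  Q3={q3:.4f}  IQR={q3 - q1:.4f}')
```

The interquartile range (Q3 minus Q1) is exactly the “middle half of the data” referred to above.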
Energy is a measure from 0.0 to 1.0 and represents a perceptual measure of intensity and activity. Typically, energetic tracks feel fast, loud, and noisy.
The median value is 0.638 and the middle half of the data (quartile 1 to 3) is between 0.488 and 0.791. If you were to pick a random song from all of the above playlists pooled together, there’s a good chance it’s got a decent level of energy (decent being defined as 0.5 and above). Interestingly, the distribution seems as if it’s been right censored due to the upper bound of 1.
Valence is a measure from 0.0 to 1.0 describing the musical positiveness conveyed by a track. High valence means more positive sounding (happy, cheerful, etc.) while low valence sounds more negative (sad, angry, etc.)
This is an interesting distribution: the median lies almost exactly in the middle with a value of 0.4995. The middle half of the data lies between 0.333 and 0.687, which pretty much covers the middle third of the bounds. In essence, this is telling me that half my tracks are neutral, one quarter are sad/angry/depressed sounding (have a value below 0.333), and one quarter are happy/cheerful/euphoric sounding (have a value above 0.687).
This feature represents the overall estimated tempo of a track in beats per minute (BPM).
The median BPM of tracks is about 120 BPM with 50% of tracks being between 98 and 140 BPM. This almost exactly aligns with an article from MasterClass that states: “Most of today’s popular songs are written in a tempo range of 100 to 140 BPM”.
I haven’t got too much to say about this; it makes sense that there will be a lower-bound BPM threshold that most tracks will be above and, conversely, an upper-bound threshold that most tracks will be below. I’d imagine this distribution would change drastically if someone were to exclusively listen to EDM, Techno, or similar high-tempo genres.
The overall loudness of a track in decibels (dB). This looks like a textbook example of a left (or negative) skewed distribution. I don’t know enough about decibels in the context of music mixing and mastering (or any context for that matter) to meaningfully comment on this.
Here we’ve got the distribution of track durations in seconds. This looks like a normal-ish distribution with a long tail on the right (we can test to see how close it is to a normal distribution using a Q-Q plot but I’ll skip that here).
What we can get from this is that the median track length is 3 minutes and 45 seconds and half of the tracks have a duration between 3 minutes and 11 seconds and 4 minutes and 19 seconds. I’ve got one track on the very far right that has a duration of 13 minutes and 44 seconds.
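The Q-Q check I’m skipping can also be done numerically: `scipy.stats.probplot` returns the correlation between the ordered data and the theoretical normal quantiles, which drops as the tail gets heavier. A sketch on synthetic durations (the shapes below are made up to mimic the real distribution, not taken from it):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
# Synthetic stand-in for the duration column: a roughly normal bulk
# around 3:45 plus a small long right tail
durations = np.concatenate([
    rng.normal(225, 35, 950),
    rng.normal(500, 120, 50),
])

# probplot returns the ordered data vs normal quantiles, plus a
# least-squares fit whose r measures closeness to normality
(osm, osr), (slope, intercept, r) = stats.probplot(durations, dist='norm')
print(f'Q-Q correlation against a normal fit: r={r:.3f}')
```

An r very close to 1 would indicate near-normality; the long tail pulls it down.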
Spotify has a measure for the popularity of an artist, which will be between 0 and 100, with 100 being the most popular. The artist’s popularity is calculated from the popularity of all the artist’s tracks.
I’ve de-duplicated the data based on artists so we’ll have one row/track per artist to see the distribution of artist popularity. Not-so-surprisingly about 33% of the artists I listen to have a popularity value of 0. Those look to be artists/bands with small followings.
```python
# Imports used throughout the analysis
from os import path

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
import umap
from sklearn.metrics import jaccard_score
from sklearn.preprocessing import minmax_scale

artist_df = df.drop_duplicates(subset='artist')
artist_df[artist_df['popularity'] == 0][['lang_jap', 'lang_kor', 'lang_can', 'lang_eng', 'indie']].sum(axis=0)
```

```
lang_jap    137
lang_kor     74
lang_can      3
lang_eng    383
indie       144
dtype: int64
```
Genre is a bit of a funny one since it’s not representing the genre(s) of the track, but rather the genre(s) of the track’s artist. While this isn’t perfect, it does act as a good enough proxy for our purposes. Note that a track’s artist can be assigned more than one genre.
At first glance it does seem like Pop is the most common genre but I think it needs to be remembered that Pop is probably the most paired/combined genre (e.g. Pop Rock or Indie Pop).
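One way to quantify that pairing/co-occurrence is the Jaccard similarity coefficient: the number of tracks labelled with both genres divided by the number labelled with either. A toy example using scikit-learn’s `jaccard_score` (the same function used on the real indicator columns below):

```python
from sklearn.metrics import jaccard_score

# Two binary genre indicator columns over five toy tracks
pop = [1, 1, 0, 1, 0]
rock = [1, 0, 0, 1, 1]

# |pop AND rock| / |pop OR rock| = 2 / 4
score = jaccard_score(pop, rock)
print(score)  # 0.5
```

A score of 1 would mean the two genre labels always appear together; 0 would mean they never do.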
Unfortunately, Spotify’s API doesn’t provide any metadata on the language of a track. Luckily, I have language-based playlists, so I can say any tracks that aren’t in those playlists are English tracks. That assumption probably holds for, let’s say, 85-90% of them.
Moving to Two Dimensions with Bivariate Analysis
Correlation Analysis
While we managed to learn about the features individually in the analysis above, we didn’t touch on the relationship between two features (if there is even a relationship). One way we can compactly visualise whether pairings of features have a relationship is to construct an N×N correlation matrix where N is the number of features.
There are several different measures that can be used to calculate the correlation between two variables; the de facto standard is Pearson’s correlation coefficient, which measures the strength and direction of the linear relationship between two continuous variables. Another measure is Spearman’s rank correlation coefficient, which is a nonparametric measure of rank correlation; in other words, it measures the correlation between the rankings of two variables (a good visual example of this is on its Wikipedia page). Unlike Pearson’s r, Spearman’s ρ can be used to assess monotonic relationships (linear relationships are a subset of these) for both continuous and discrete ordinal (ranked order) variables.
In March 2019, a new measure of correlation was unveiled that is able both to capture non-linear dependencies and to work consistently across categorical, ordinal, and continuous variables. It is named 𝜙K (PhiK), and the technical details, alongside a few practical examples, are found in its paper. Unlike both Pearson’s r and Spearman’s ρ, 𝜙K is bounded between 0 and 1, telling us the strength but not the direction of a relationship. This is because direction isn’t well-defined for a non-monotonic relationship, as you’ll see below.
I’ve created a graph below to compare the three correlation coefficients on synthetic datasets (which is based on the first image from Wikipedia’s page on Correlation). As you can see, the third and fourth row of charts are all non-linear relationships between x and y which, as expected, result in a correlation coefficient value of zero for both Pearson’s r and Spearman’s ρ. On the other hand, 𝜙K is able to identify these relationships which are statistically significant at the 0.1% level. Exciting stuff!
For a sample of data from the population, r is used instead of ρ and is referred to as the sample Pearson correlation coefficient.
Armed with these tools, let’s go ahead and use them to create correlation matrices for the Spotify audio features.
```python
fig = px.scatter_3d(
    df.drop_duplicates(subset=['id']),
    x='danceability',
    y='energy',
    z='valence',
    color='tempo',
    width=800,
    height=800
)
fig.update_traces(
    marker=dict(size=2),
    selector=dict(mode='markers')
)
fig.show()
```

```python
# Initialise array of zeros to store results in
score_arr = np.zeros((len(genres), len(genres)))

# Calculate the Jaccard similarity score for pairs of genres:
# of the tracks classified as either genre, the proportion classified as both
for idx1, genre1 in enumerate(genres):
    for idx2, genre2 in enumerate(genres):
        score_arr[idx1][idx2] = jaccard_score(df[genre1] == 1, df[genre2] == 1)

# Plot results
plt.figure(figsize=(12, 10))
sns.heatmap(
    data=score_arr,
    cmap='Blues',
    robust=True,
    square=True,
    linewidths=0.5,
    annot=True,
    fmt='.2f',
    annot_kws={'fontsize': 12},
    cbar=True,
    cbar_kws={'shrink': 0.95},
    xticklabels=[' '.join(g.split('_')).title() for g in genres],
    yticklabels=[' '.join(g.split('_')).title() for g in genres],
)
plt.title('Jaccard similarity coefficient of genres', fontsize=18)
plt.xlabel('Genre', fontsize=15)
plt.ylabel('Genre', fontsize=15)
plt.xticks(fontsize=13)
plt.yticks(fontsize=13)
plt.tight_layout()
save_path = f'{FIG_SAVE_PATH}/jaccard_similary_coeffs_genres.png'
if not path.exists(save_path):
    plt.savefig(save_path, bbox_inches='tight')
plt.show()
```
```python
# Let's find the 'correlation' between combined genres (e.g. tracks assigned
# both the pop and rock genres) and single genres (e.g. indie)
# First get the combined genres we're interested in
# We don't want permutations, i.e. we treat pop & rock and rock & pop as the same thing
# We also only keep combined genres whose two individual genres have a similarity score > 0.1
rec_set = set()
for genre1 in genres:
    for genre2 in genres:
        if genre1 == genre2:
            continue
        elif (f"{genre1},{genre2}" in rec_set) or (f"{genre2},{genre1}" in rec_set):
            continue
        elif jaccard_score(df[genre1] == 1, df[genre2] == 1) < 0.1:
            continue
        else:
            rec_set.add(f"{genre1},{genre2}")

# Initialise array of zeros to store results in
score_arr = np.zeros((len(rec_set), len(genres)))

# Calculate the Jaccard similarity score between combined genres and individual genres
for idx1, comb_genre in enumerate(list(rec_set)):
    print(comb_genre)
    comb_genre1, comb_genre2 = comb_genre.split(',')
    for idx2, genre in enumerate(genres):
        score_arr[idx1][idx2] = jaccard_score((df[comb_genre1] == 1) & (df[comb_genre2] == 1), df[genre] == 1)

# Plot results
plt.figure(figsize=(12, 10))
sns.heatmap(
    data=score_arr.T,
    cmap='Blues',
    robust=True,
    square=True,
    linewidths=0.5,
    annot=True,
    fmt='.2f',
    annot_kws={'fontsize': 12},
    cbar=True,
    cbar_kws={'shrink': 0.95},
    xticklabels=[' '.join(' & '.join(g.split(',')).split('_')).title() for g in list(rec_set)],
    yticklabels=[' '.join(g.split('_')).title() for g in genres],
)
plt.title('Jaccard similarity coefficient of combined genres with single genres', fontsize=17)
plt.xlabel('Genre', fontsize=15)
plt.ylabel('Genre', fontsize=15)
plt.xticks(fontsize=13, rotation=15)
plt.yticks(fontsize=13)
plt.tight_layout()
save_path = f'{FIG_SAVE_PATH}/jaccard_similary_coeffs_combined_genres.png'
if not path.exists(save_path):
    plt.savefig(save_path, bbox_inches='tight')
plt.show()
```

```
pop,indie
rock,indie
hip_hop,rap
rock,alternative
indie,alternative
pop,rock
```

```python
nn_feat_cols = [
    'danceability',
    'energy',
    'loudness',
    # 'speechiness',
    'acousticness',
    # 'instrumentalness',
    # 'liveness',
    'valence',
    'tempo',
    # 'key',
    # 'time_signature',
    'lang_eng',
    'lang_kor',
    'lang_jap',
    'lang_can',
    'pop',
    'rock',
    'hip_hop',
    'indie',
    'rap',
    'alternative',
]

# Manually set weightings on columns to bias them; a higher weight puts a higher bias on a feature
WEIGHTING = False
weighting = {
    'danceability': 1,
    'energy': 1,
    'loudness': 1,
    'valence': 1,
    'instrumentalness': 1,
    'acousticness': 1,
    'tempo': 1,
    'key': 1,
    'time_signature': 1,
    'lang_eng': 1,
    'lang_kor': 1,
    'lang_jap': 1,
    'lang_can': 1,
    'pop': 1,
    'rock': 1,
    'hip_hop': 1,
    'indie': 1,
    'rap': 1,
    'alternative': 1,
}

## Remove duplicate songs in a way that retains the most information about the playlists the songs belong to
# Mask for genre or language playlists
mask = df['playlist_name'].isin([
    name for name in df['playlist_name'].unique()
    if ('Complete' not in name) and ('&' not in name) and ('playlist' not in name)
])

# Separate the dataframe into one for time-based playlists (i.e. yearly or bimonthly)
# and another for genre- or language-based playlists, dropping duplicate songs in both
dupe_cols = [col for col in df.columns if col not in ['playlist_name', 'playlist_date_added', 'playlist_track_id']]
dedup_lang_genre_df = df[mask].sort_values(by=['playlist_name']).drop_duplicates(subset=dupe_cols, keep='first')
dedupe_time_df = df[~mask].sort_values(by=['playlist_name']).drop_duplicates(subset=dupe_cols, keep='first')

# Concatenate both into one dataframe with the genre/language dataframe at the top,
# then do a final drop_duplicates so duplicate songs are kept from the genre/language dataframe only
dedupe_df = pd.concat([dedup_lang_genre_df, dedupe_time_df]).drop_duplicates(subset=dupe_cols, keep='first')
df_weighted = dedupe_df.copy().drop_duplicates(subset=nn_feat_cols).drop_duplicates(subset=['name']).reset_index(drop=True)

# Scale unbounded features into the [0, 1] range
for col in ['tempo', 'loudness', 'time_signature', 'key']:
    if col in nn_feat_cols:
        df_weighted[col] = minmax_scale(df_weighted[col].values)

# Weight features if required
if WEIGHTING:
    for col in nn_feat_cols:
        df_weighted[col] = weighting[col] * np.abs(df_weighted[col])
```

```python
df_weighted['playlist_name'].value_counts()
```

```
Japanese                           861
Indie Alt                          685
2018 Complete Round Up             297
Korean                             272
2017 Complete Round Up             270
Chill                              234
Pop                                234
Rap/HipHop/Trap                    205
2021 Complete Round Up              86
2019 Complete Round Up              83
Cantonese                           78
2020 Complete Round Up              66
Crème de la Crème                   38
2022 Complete Round Up              38
Mood/Instrumental                   30
January & February 2020             23
November & December 2019            16
Upbeat                              14
May & June 2019                     10
August & September 2021             10
March & April 2019                  10
September & October 2019             9
January & February 2022              9
July & August 2019                   9
January & February 2019              9
May & June 2022                      5
Special Select J-indie playlist      3
March & April 2020                   1
Name: playlist_name, dtype: int64
```
```python
col_lists = [
    [
        'danceability',
        'energy',
        'loudness',
        'acousticness',
        'valence',
        'tempo'
    ],
    [
        'lang_eng',
        'lang_kor',
        'lang_jap',
        'lang_can'
    ],
    [
        'pop',
        'rock',
        'hip_hop',
        'indie',
        'rap',
        'alternative'
    ],
]
title_map = {
    0: 'UMAP embedding created using audio track features',
    1: 'UMAP embedding created using audio track features and language indicator',
    2: 'UMAP embedding created using audio track features and language and genre indicators',
}
for idx in range(len(col_lists)):
    # Accumulate the feature groups seen so far
    cols = [item for sub in col_lists[:idx + 1] for item in sub]
    # DR using UMAP for 2d vis of data
    reducer = umap.UMAP(n_neighbors=6, random_state=42)
    embedding = reducer.fit_transform(df_weighted[cols])
    embedding_df = pd.DataFrame(embedding, columns=['dim_1', 'dim_2']).join(df_weighted[['playlist_name']])
    fig, ax = plt.subplots(figsize=(18, 12))
    mask = embedding_df['playlist_name'].isin(['Korean', 'Japanese', 'Cantonese', 'Indie Alt', 'Pop', 'Rap/HipHop/Trap', 'Chill', 'Mood/Instrumental', 'Upbeat'])
    sns.scatterplot(
        x=embedding_df[mask]['dim_1'],
        y=embedding_df[mask]['dim_2'],
        s=50,
        alpha=0.9,
        ax=ax,
        hue=embedding_df[mask]['playlist_name']
    )
    plt.xlabel('Embedding dimension 1', fontsize=15)
    plt.ylabel('Embedding dimension 2', fontsize=15)
    plt.title(f'{title_map[idx]}', fontsize=18)
    plt.legend(title='Playlist name', title_fontsize=15, fontsize=14, loc='lower right')
    plt.tight_layout()
    plt.show()
```


```python
# DR using UMAP for 2d vis of data
reducer = umap.UMAP(n_neighbors=6, random_state=234)
embedding = reducer.fit_transform(df_weighted[nn_feat_cols])
embedding_df = pd.DataFrame(embedding, columns=['dim_1', 'dim_2']).join(
    df_weighted[['playlist_name', 'lang_jap', 'lang_kor', 'lang_can', 'lang_eng',
                 'pop', 'rock', 'hip_hop', 'indie', 'rap', 'rnb', 'alternative']]
)

title_map = {
    'lang_jap': 'Japanese',
    'lang_kor': 'Korean',
    'lang_can': 'Cantonese',
    'lang_eng': 'English',
    'pop': 'Pop',
    'rock': 'Rock',
    'hip_hop': 'Hip Hop',
    'indie': 'Indie',
    'rap': 'Rap',
    'alternative': 'Alternative',
}
plt.figure(figsize=(35, 25))
for idx, genre in enumerate(title_map, 1):
    plt.subplot(3, 4, idx)
    sns.scatterplot(
        x=embedding_df['dim_1'],
        y=embedding_df['dim_2'],
        s=40,
        alpha=0.5,
        hue=embedding_df[genre]
    )
    plt.xlabel('Embedding Dimension 1', fontsize=13)
    plt.ylabel('Embedding Dimension 2', fontsize=13)
    plt.title(f'UMAP embedding by {title_map[genre]} attribute', fontsize=15)
    plt.legend(title=title_map[genre], title_fontsize=13, fontsize=12, loc='upper left')
plt.suptitle('Track attributes of UMAP embedding projection', fontsize=19)
plt.tight_layout(rect=[0, 0, 1, 0.98])
plt.show()
```

```python
# DR using UMAP for 3d vis of data
reducer = umap.UMAP(n_neighbors=30, n_components=3)
embedding = reducer.fit_transform(df_weighted[nn_feat_cols])
fig = px.scatter_3d(
    x=embedding[:, 0],
    y=embedding[:, 1],
    z=embedding[:, 2],
    width=800,
    height=800
)
fig.update_traces(
    marker=dict(size=2),
    selector=dict(mode='markers')
)
fig.show()
```
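As a teaser for where this is heading, the similar-track lookup can be sketched with scikit-learn’s `NearestNeighbors` fitted on the scaled feature matrix. In the real version the matrix would be `df_weighted[nn_feat_cols]`; here a tiny made-up matrix stands in so the shape of the query is clear:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Toy stand-in for df_weighted[nn_feat_cols].values: 5 tracks, 3 scaled features
X = np.array([
    [0.74, 0.52, 0.66],  # track 0
    [0.52, 0.59, 0.41],  # track 1
    [0.60, 0.44, 0.19],  # track 2
    [0.92, 0.59, 0.97],  # track 3
    [0.73, 0.51, 0.65],  # track 4: nearly identical to track 0
])

nn = NearestNeighbors(n_neighbors=3, metric='euclidean').fit(X)
# Query with track 0; the first hit is the query itself,
# followed by its closest neighbours in feature space
distances, indices = nn.kneighbors(X[[0]])
print(indices[0])
```

Mapping the returned indices back to `name` and `artist` columns gives the “most similar tracks” answer the intro promised; the feature weightings set earlier would simply stretch the axes of this space before fitting.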